debakarr
GitHub Repository: debakarr/machinelearning
Path: blob/master/Part 2 - Regression/Simple Linear Regression/[Python] Simple Linear Regression.ipynb
Kernel: Python 3

Simple Linear Regression

from IPython.display import Image
Image("img/01.png")
Image in a Jupyter notebook

  • b0 is the constant (intercept), representing the base salary of someone who enters the profession with no experience, i.e. Experience = 0

  • b1 is the coefficient representing the slope: the more experience, the higher the salary.

Here in the graph, the black line is the best-fitting line.
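The model y = b0 + b1·x above has a closed-form solution. As a minimal sketch (not part of the original notebook, using made-up toy data), the OLS estimates are b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄:

```python
# Closed-form OLS estimates for y = b0 + b1 * x on a tiny toy dataset.
# The data below is hypothetical and lies exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)

# Slope: covariance of x and y divided by variance of x
b1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
     sum((x - x_mean) ** 2 for x in xs)
# Intercept: line passes through the point of means
b0 = y_mean - b1 * x_mean

print(b0, b1)  # → 1.0 2.0
```

Because the toy data is perfectly linear, the recovered intercept and slope match exactly.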


Image("img/02.png")
Image in a Jupyter notebook

Actual values vs. model values, and Ordinary Least Squares

Image("img/03.png")
Image in a Jupyter notebook
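The idea behind Ordinary Least Squares is that the fitted line minimizes the sum of squared residuals, Σ(y − ŷ)². A small sketch (toy data, not from the notebook) showing that perturbing the OLS coefficients can only increase that sum:

```python
# Hedged sketch: OLS minimizes the sum of squared residuals (SSE).
# Toy, hypothetical data that is roughly (not exactly) linear.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.3, 5.9, 8.2, 9.8]

def sse(b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Closed-form OLS estimates
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
b1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
     sum((x - x_mean) ** 2 for x in xs)
b0 = y_mean - b1 * x_mean

# Any deviation from the OLS line gives a larger (or equal) SSE
assert sse(b0, b1) <= sse(b0, b1 + 0.1)
assert sse(b0, b1) <= sse(b0 + 0.5, b1)
```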

Data Preprocessing

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
%matplotlib inline

# Importing the dataset
dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values

# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Feature Scaling
"""from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
sc_y = StandardScaler()
y_train = sc_y.fit_transform(y_train)"""
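With test_size=0.25, scikit-learn puts ceil(0.25 · n) rows in the test set. A small sketch using synthetic stand-in data (Salary_Data.csv is not included here; the linear relation below is hypothetical) reproduces the 22/8 split seen in the outputs below:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: 30 rows, like the salary dataset
X = np.arange(30, dtype=float).reshape(-1, 1)   # stand-in for YearsExperience
y = 9000.0 * X.ravel() + 25000.0                # stand-in for Salary

# Same split parameters as the notebook
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

print(X_train.shape, X_test.shape)  # → (22, 1) (8, 1): ceil(30 * 0.25) = 8 test rows
```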
X_train
array([[ 4. ], [ 1.1], [ 2.2], [ 5.1], [ 2.9], [ 4.1], [ 4. ], [ 7.9], [ 1.3], [ 1.5], [ 9. ], [ 2. ], [ 7.1], [ 9.5], [ 5.9], [ 10.5], [ 6.8], [ 3.2], [ 3.9], [ 4.5], [ 6. ], [ 3. ]])
X_test
array([[ 9.6], [ 4.9], [ 8.2], [ 5.3], [ 3.2], [ 3.7], [ 10.3], [ 8.7]])
y_train
array([ 56957., 39343., 39891., 66029., 56642., 57081., 55794., 101302., 46205., 37731., 105582., 43525., 98273., 116969., 81363., 121872., 91738., 54445., 63218., 61111., 93940., 60150.])
y_test
array([ 112635., 67938., 113812., 83088., 64445., 57189., 122391., 109431.])

Fitting Simple Linear Regression to the Training Set

regressor = LinearRegression() # Create an object of the LinearRegression class
regressor.fit(X_train, y_train) # Make the machine learn the correlation from the training data
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
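After fitting, the learned intercept (b0) and slope (b1) are exposed as the `intercept_` and `coef_` attributes. A self-contained sketch on toy data (not the salary dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data lying exactly on y = 1 + 2x
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

reg = LinearRegression()
reg.fit(X, y)

# Learned parameters: intercept_ is b0, coef_[0] is b1
print(reg.intercept_, reg.coef_[0])  # → 1.0 2.0 (up to floating-point error)
```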

Predicting the Test set results

y_pred = regressor.predict(X_test)
y_pred
array([ 115439.88180109, 71396.10622651, 102320.45928951, 75144.51265839, 55465.37889103, 60150.88693088, 121999.59305688, 107005.96732936])
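How good are these predictions? A sketch (not in the original notebook) scoring the printed y_test / y_pred values with mean absolute error and R² from sklearn.metrics:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Values copied from the notebook outputs above
y_test = np.array([112635., 67938., 113812., 83088.,
                   64445., 57189., 122391., 109431.])
y_pred = np.array([115439.88180109, 71396.10622651, 102320.45928951,
                   75144.51265839, 55465.37889103, 60150.88693088,
                   121999.59305688, 107005.96732936])

# MAE: average absolute prediction error in salary units
print(mean_absolute_error(y_test, y_pred))
# R^2: fraction of salary variance explained by experience
print(r2_score(y_test, y_pred))
```

On this test set the fit is strong: the average error is a few thousand in salary units and R² is above 0.9.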

Visualising the Training set results

  • X = Years of Experience

  • Y = Salary

plt.scatter(X_train, y_train, c = 'red')
plt.plot(X_train, regressor.predict(X_train), c = 'green')
plt.title('Salary vs Experience (Training Set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
Text(0,0.5,'Salary')
Image in a Jupyter notebook

Visualising the Test set results

plt.scatter(X_test, y_test, c = 'red')
# No need to swap X_train for X_test in the line plot: the regressor is one
# fixed line, so plotting it over the training x-values gives the same line
plt.plot(X_train, regressor.predict(X_train), c = 'green')
plt.title('Salary vs Experience (Test Set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
Text(0,0.5,'Salary')
Image in a Jupyter notebook